(54) Method for generating personality patterns and for synthesizing speech



(57) To mimic the speaking behavior of a given
speaker, a method for generating personality patterns,
in particular for synthesizing speech, is proposed in
which acoustical as well as non-acoustical speech
features (SF) are extracted from a given speech input (SI).

Description

[0001] The present invention relates to a method for 
generating personality patterns and to a method for syn- 
thesizing speech. 

[0002] Nowadays, a large variety of equipment and 
appliances employ man-machine dialogue systems to 
ensure an easy and reliable use by a human user. These 
man-machine dialogue systems are enabled to receive 
and consider users' utterances, in particular orders and/ 
or inquiries, and to react and respond in an appropriate 
way. Nevertheless, current speech synthesis systems 
involved in such man-machine dialogue systems suffer 
from a lack of personality and naturalness. Although the 
systems are enabled to deal with the context of the
situation in an appropriate way, the prepared and output
speech of the dialogue system often sounds monotonous
and machine-like and is not embedded into the particular
situation.

[0003] It is an object of the present invention to pro- 
vide a method for generating personality patterns in par- 
ticular for synthesizing speech and a method for synthe- 
sizing speech in which naturalness of the speech and 
its features can be realized. 

[0004] The object is achieved by a method for gener- 
ating personality patterns, in particular for synthesizing 
speech, with the features of claim 1. Furthermore, the
object is achieved by a method for synthesizing speech
according to the characterizing features of claim 11. A
system and a computer program product for carrying out 
the inventive methods are the subject-matter of claims 
14 and 15, respectively. Preferred embodiments of the 
inventive methods are within the scope of the dependent 
subclaims. 

[0005] In the inventive method for generating person- 
ality patterns, in particular for synthesizing speech, a 
speech input is received and/or preprocessed. From the 
speech input acoustical and/or non-acoustical speech 
features are extracted. Based on the extracted speech 
features and/or on models and/or parameters thereof, 
a personality pattern is generated and/or stored. 
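
The steps of this paragraph (receive speech, extract features, generate and store a pattern) can be illustrated with a minimal sketch. The class `PersonalityPattern`, its field names, and the JSON storage format are assumptions made for illustration only; the description does not prescribe any particular data layout, and the feature extraction itself is not shown here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PersonalityPattern:
    """Hypothetical container for the speech features of one speaker."""
    speaker_id: str
    acoustical: dict = field(default_factory=dict)      # e.g. prosodic features
    non_acoustical: dict = field(default_factory=dict)  # e.g. contextual features


def generate_pattern(speaker_id, acoustical, non_acoustical):
    """Build a pattern from features that were already extracted elsewhere."""
    return PersonalityPattern(speaker_id, dict(acoustical), dict(non_acoustical))


def store_pattern(pattern):
    """Serialize the pattern to JSON text so it can be stored for later use."""
    return json.dumps(asdict(pattern), sort_keys=True)


def load_pattern(text):
    """Rebuild a pattern from its stored JSON text."""
    return PersonalityPattern(**json.loads(text))


# Illustrative values only; they are not taken from the description.
pattern = generate_pattern(
    "speaker-1",
    acoustical={"mean_pitch_hz": 120.0, "speaking_rate": 4.2},
    non_acoustical={"frequent_words": ["well", "actually"]},
)
restored = load_pattern(store_pattern(pattern))
```

Keeping acoustical and non-acoustical features in separate fields mirrors the two feature classes named above and lets either class be omitted.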

[0006] It is therefore a basic idea of the present invention
to extract acoustical and, alternatively or simultaneously,
non-acoustical speech features from a received speech input.
The speech features are then directly or indirectly used to
construct a personality pattern which can later be used to
reconstruct a speech output that mimics the speech input and
its speaker. The speech features are therefore parameterized
or modeled and included or described in certain models or
units.

[0007] According to an embodiment of the inventive 
method for generating personality patterns, online input 
speech and/or speech of a speech data base for at least 
one given speaker are used for receiving said speech 
input. Using a speech data base enables a system in- 
volving the inventive method to generate the personality 
patterns in advance of an application. That means that,
before the system is applied, for example in a speech
synthesizing unit, a speech model for a single speaker
or for a variety of speakers can be constructed. Within
the application of the inventive method it is also possible
to construct the personality patterns during the application
in a speech synthesizing unit in a real time or online
manner, so as to adapt a speech output generated in a
dialogue system during the application and/or during the
dialogue with the user.

[0008] It is an aspect of the present invention to use
a large variety of features from the speech input so as
to model the personality patterns as well as possible and
thereby to achieve, in an application of a dialogue system,
a particularly natural responding speech output.

[0009] It is therefore an aspect of a further embodiment
of the present invention to use prosodic features,
voice quality features, global statistical and/or spectral
properties, and/or the like as acoustical features.

[0010] Within the class of prosodic features, pitch,
pitch range, intonation attitude, loudness, speaking rate,
phone duration, speech element duration features, and/or
the like can be employed.

[0011] Within the class of voice quality features,
phonation type, articulation manner, voice timbre features,
and/or the like can be employed.

[0012] In the class of non-acoustical features, contextual
features and/or the like may be important in accordance
with a further advantageous embodiment of the present
invention. In particular, syntactical, grammatical, and
semantical features, and/or the like can be used as
contextual features.

[0013] As a human speaker has distinct preferences
in constructing sentences, phrases, word combinations,
and/or the like, according to a further preferred
embodiment of the present invention, within the class of
non-acoustical features, statistical features on the usage,
distribution, and/or probability of speech elements - such
as words, subword units, syllables, phonemes, phones,
and/or the like - and/or combinations of them within said
speech input can be used. Additional sentence, phrase, and
word combination preferences can be evaluated and
included into said personality pattern.
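
As an illustration of statistical features on the usage and probability of speech elements, the following sketch computes the relative frequency of each word over a set of utterances. It assumes that the utterances are already available as text (for example from a speech recognizer) and that whitespace splitting is an adequate tokenization; the function name `usage_distribution` is hypothetical.

```python
from collections import Counter


def usage_distribution(utterances):
    """Relative frequency of each word over all given utterances."""
    words = [w for u in utterances for w in u.lower().split()]
    counts = Counter(words)
    total = sum(counts.values())
    # An empty input yields an empty distribution instead of dividing by zero.
    return {w: c / total for w, c in counts.items()} if total else {}


dist = usage_distribution(["well I think so", "well maybe", "I think not"])
```

The same counting scheme could be applied to other speech elements named above, such as syllables or phonemes, provided a segmentation into those units is available.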

[0014] To prepare for the extraction of contextual
features or the like, a process of speech recognition is
preferably carried out within the inventive method.

[0015] Alternatively or additionally, a process of
speaker identification and/or adaptation can be performed,
in particular so as to increase the matching rate
of the feature extraction and/or the recognition rate
of the process of speech recognition.

[0016] In the inventive method for synthesizing 
speech, in particular for a man-machine dialogue sys- 
tem, the inventive method for generating personality 
patterns is employed. 

[0017] According to a further embodiment of the
inventive method for synthesizing speech, the method for
generating personality patterns is essentially carried out
in a preprocessing step, in particular based on a speech
data base or the like.

[0018] Alternatively or additionally, the method for
generating personality patterns can be carried out and/or
continued in a continuous, real time, or online manner.
This enables a system involving said method for
synthesizing speech to adapt its speech output in
accordance with the received input during the dialogue.
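
One conceivable way to continue the pattern generation online is to update a stored feature incrementally with each new utterance. The sketch below keeps a running mean of a per-utterance pitch value; the class name `OnlinePitchProfile` and the choice of a plain running mean are assumptions for illustration, not something the description specifies.

```python
class OnlinePitchProfile:
    """Running mean of a per-utterance feature, updated incrementally."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: earlier utterances need not be kept in memory.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


profile = OnlinePitchProfile()
for pitch_hz in (110.0, 130.0, 120.0):
    profile.update(pitch_hz)
```

After each update the current mean is available immediately, so a speech output could be adapted while the dialogue is still in progress.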

[0019] Both the method for generating personality patterns
and the method for synthesizing speech can be configured
to create a personality pattern or a speech output
which is in some sense complementary to the personality
pattern or character assigned to the speaker of the
speech input. For instance, in the case of an emergency
call system for activating ambulance or fire alarm
services, the speaker of the speech input might be excited
and/or confused. It might therefore be necessary to calm
down the speaking person, and this can be achieved by
creating a personality pattern for the speech synthesis
reflecting a strong, confident, and safe character.
Additionally, it might also be possible to construct
personality patterns for the synthesized speech output
which reflect a gender complementary to the gender of the
speaker of the speech input, i.e. in the case of a male
speaker, the system might respond as a female speaker so
as to make the dialogue most convenient for the speaking
person.
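
The selection of a complementary pattern can be sketched as a simple rule table. The keys `mood`, `gender`, `character`, and `voice_gender` and their values are hypothetical labels chosen for this illustration; the symmetric rule for a female speaker is likewise an assumption, since the paragraph above only gives the male-speaker example.

```python
def complementary_pattern(user):
    """Return hypothetical synthesis settings complementary to the user."""
    response = {"character": "neutral", "voice_gender": user.get("gender")}
    if user.get("mood") in ("excited", "confused"):
        # Calm an agitated caller with a strong, confident character.
        response["character"] = "confident"
    if user.get("gender") == "male":
        response["voice_gender"] = "female"
    elif user.get("gender") == "female":
        # Assumption: the text only mentions the male-speaker case.
        response["voice_gender"] = "male"
    return response


reply_style = complementary_pattern({"mood": "excited", "gender": "male"})
```

In an actual system the user state would come from the extracted personality pattern rather than from hand-written labels.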

[0020] It is a further aspect of the present invention to
provide a system, an apparatus, a device, and/or the 
like for generating personality patterns and/or for syn- 
thesizing speech which is in each case capable of per- 
forming and/or realizing the inventive methods for gen- 
erating personality patterns and/or for synthesizing 
speech and/or its steps. 

[0021] According to a further aspect of the present in- 
vention, a computer program product is provided, com- 
prising computer program means which is adapted to 
perform and/or to realize the inventive method for gen- 
erating personality patterns and/or for synthesizing 
speech and/or the steps thereof when it is executed on 
a computer, a digital signal processing means, and/or 
the like. 

[0022] The aspects of the present invention will become
clearer in view of the following remarks:

[0023] After the identification of a speaker, both his
relevant voice quality features and his speech itself - as
described by any units, such as words, syllables,
diphones, sentences, and/or the like - are automatically
extracted according to the invention. Information
about preferred sentence structure and word usage is
also extracted and used to create a speech synthesis system
with those characteristics in a completely unsupervised
way.

[0024] The starting point for these inventive concepts
is the lack of personality of current speech synthesis
systems. Prior art systems are developed with
text-to-speech (TTS) operation in mind, where intelligibility
and naturalness of speech are most important. For
dialogue systems, however, the personality of the dialogue
partner is essential, too. Depending on the personality
of the artificial dialogue partner, the speaker may be
interested in continuing the dialogue or not. Thus,
adding a personality pattern to the speech generated by
the device may be crucial for the success of the dialogue
device.

[0025] Therefore, it is proposed to collect and store
all information about the speaking style of the person
making conversation with the system or device and to use
said information to modify the speaking style of the
device.

[0026] The proposed methods can be used to mimic
the actual speaker talking to the device but also to equip
the device with some different personalities, e.g.
gathered from the speaking style of famous people, movie
stars, or the like. This can be very attractive for potential
customers. The proposed system can be used not only
to mimic a speaker's behavior but more generally to
control the dialogue depending on the changing speaking
style and emotions of the human partner.

[0027] The collection of features describing the
speaker's personality can be done on different levels
during the conversation of the human with a dialogue unit.
In order to mimic the speaker's voice, the speech signal
has to be recorded and segmented into phones,
diphones, and/or into other speech units or speech
elements, depending on the speech synthesis method
used in the system.

[0028] Prosodic features like pitch, pitch range,
attitude of sentence intonation (monotonous or affected),
loudness, speaking rate, durations of phones, and/or
the like can be collected to characterize the speaker's
prosody.
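
Two of the prosodic features named above, pitch and loudness, can be estimated from a frame of samples as sketched below. The sketch uses a plain autocorrelation search for pitch and the root-mean-square amplitude for loudness; these are common textbook estimators chosen here for illustration, not techniques stated in this description. The search range of 50 Hz to 400 Hz and the synthetic test tone are likewise assumptions.

```python
import math


def rms_loudness(samples):
    """Root-mean-square amplitude as a simple loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz from the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 100 Hz tone standing in for a voiced speech frame.
RATE = 8000
frame = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(800)]
pitch = estimate_pitch(frame, RATE)
loudness = rms_loudness(frame)
```

Applying `estimate_pitch` to successive frames and taking the minimum and maximum of the results would give a simple pitch range.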

[0029] Voice quality features like phonation type,
articulation manner, voice timbre, and/or the like can be
automatically extracted from the collected speech data.

[0030] Speaker identification or a speaker identification
module is necessary for the proper functioning of the
system.

[0031] The system can also collect all the words
recognized from the utterances spoken by the speaker
and generate and evaluate statistics on their usage.
This can be used to find the most frequent phrases and
words used by a given speaker, and/or the like. Syntactic
information gathered from the recognized phrases
can also enhance the quality of the personality description.
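
Finding the most frequent phrases of a speaker can be sketched by counting word n-grams over the recognized utterances. The function name `frequent_phrases`, the default of word pairs (`n=2`), and the example utterances are assumptions for illustration.

```python
from collections import Counter


def frequent_phrases(utterances, n=2, top=3):
    """Count word n-grams within each utterance and return the most common."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        # n-grams never span two utterances.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top)


phrases = frequent_phrases(
    ["you know I like it", "you know it works", "I like it a lot"]
)
```

Here `phrases` holds the three word pairs that occur twice in the example utterances, each paired with its count.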

[0032] After all necessary information has been
collected, the dialogue system can adjust parameters and
units of acoustic output - for example the synthesized
waveforms or the like - and modes of text generation to
suit the recognized speaker's characteristics.

[0033] The parameterized personality can be stored
for future use or can be preprogrammed in the dialogue
device. The information can be used to recognize
speakers and to change the personality of the system
depending on the user's preference or mood, for example
in case of a system with a built-in emotion recognition
engine.

[0034] The personality can be changed according to
the user's wish, according to a preprogrammed sequence,
or depending on the speaker's changing style and
emotions.

[0035] The main advantage of such a system is the 
possibility to adapt the dialogue to the given speaker, 
make the dialogue more attractive, and/or the like. The 
possibility to mimic certain speakers or to switch be- 
tween different personalities or speaking styles can be 
very entertaining and attractive for the user. 

[0036] In the following, further advantages and
aspects of the present invention will be described
with reference to the accompanying figure.

Fig. 1 is a schematical block diagram describing a 
preferred embodiment of a method for synthe- 
sizing speech employing an embodiment of 
the inventive method for generating personal- 
ity patterns. 

[0037] The schematical block diagram of Fig. 1 shows
a preferred embodiment of the inventive method for
synthesizing speech employing an embodiment of the
inventive method for generating a personality pattern from
a given received speech input SI.

[0038] In step S1, speech input SI is received. In a
first section S10 of the inventive method for synthesizing
speech, non-acoustical features are extracted from the
received speech input SI. In a second section S20 of the
inventive method for synthesizing speech, acoustical
features are extracted from the received speech input
SI. The sections S10 and S20 can be performed in parallel
or sequentially on a given device or apparatus.

[0039] In the first section S10 for extracting
non-acoustical features from the speech input SI, speech
parameters are extracted from said speech input SI in a
first step S11. In a second step S12, the speech input
SI is fed into a speech recognizer to analyze the content
and the context of the received speech input SI.

[0040] Based on the recognition result, in a following
step S13 contextual features are extracted from said
speech input SI; in particular, syntactical, semantical,
grammatical, and statistical information on particular
speech elements is obtained.

[0041] In the embodiment of Fig. 1, the second
section S20 of the inventive method for synthesizing speech
consists of three steps S21, S22, and S23 to be
performed independently of each other.

[0042] In the first step S21 of the second section S20
for extracting acoustical features, prosodic features are
extracted from the received speech input SI. Said
prosodic features may comprise features of pitch, pitch
range, intonation attitude, loudness, speaking rate,
speech element duration, and/or the like.

[0043] In a second step S22, voice quality features
are extracted from the given received speech input SI,
for instance phonation type, articulation manner, voice
timbre features, and/or the like.

[0044] Finally, in a third and final step S23 of the
second section S20, statistical/spectral features are
extracted from the given speech input SI.

[0045] The non-acoustical features and the acoustical
features obtained from sections S10 and S20 are
merged in a following postprocessing step S30 to
detect, model, and store a personality pattern PP for the
given speaker.

[0046] The data describing the personality pattern PP
for the current speaker are fed into a following step S40
which includes the steps of speech synthesis, text
generation, and dialogue managing from which a
responsive speech output SO is generated and then output in
a final step S50.
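
The flow of Fig. 1 up to step S30 can be sketched as two extraction sections that run concurrently and are then merged. The two extractor functions are deliberately trivial placeholders, and the dictionary layout of `speech_input` (a transcript plus raw samples) is an assumption; real sections S10 and S20 would involve speech recognition and signal analysis as described above.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_non_acoustical(speech_input):
    """Section S10 placeholder: a contextual feature from a transcript."""
    words = speech_input["transcript"].split()
    return {"word_count": len(words)}


def extract_acoustical(speech_input):
    """Section S20 placeholder: a loudness proxy from raw samples."""
    samples = speech_input["samples"]
    return {"peak_amplitude": max(abs(s) for s in samples)}


def build_personality_pattern(speech_input):
    """Run S10 and S20 concurrently, then merge the results (step S30)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        s10 = pool.submit(extract_non_acoustical, speech_input)
        s20 = pool.submit(extract_acoustical, speech_input)
        return {"non_acoustical": s10.result(), "acoustical": s20.result()}


speech_input = {
    "transcript": "please call an ambulance",
    "samples": [0.1, -0.4, 0.3],
}
pattern = build_personality_pattern(speech_input)
```

Because the two sections do not depend on each other, calling them one after the other would give the same merged result, which corresponds to the sequential variant mentioned in the description.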



Claims 

1. Method for generating personality patterns, in par- 
ticular for synthesizing speech, wherein: 

- speech input (SI) is received and/or preprocessed,

- acoustical and/or non-acoustical speech features
(SF) are extracted from said speech input (SI),

- based on the extracted speech features (SF) or
on models or parameters thereof a personality
pattern (PP) is generated and/or stored.

2. Method according to claim 1, wherein online input 
speech and/or speech of a speech data base for at 
least one given speaker are used for receiving said 
speech input (SI). 

3. Method according to any one of the preceding
claims, wherein prosodic features, voice quality
features, global statistical and/or spectral properties,
and/or the like are used as acoustical features.

4. Method according to claim 3, wherein pitch, pitch 
range, intonation attitude, loudness, speaking rate, 
phone duration, speech element duration features, 
and/or the like are used as prosodic features. 

5. Method according to any one of the claims 3 or 4,
wherein phonation type, articulation manner, voice 
timbre features, and/or the like are used as voice 
quality features. 

6. Method according to any one of the preceding
claims, wherein contextual features and/or the like
are used as said non-acoustical features.

7. Method according to claim 6, wherein syntactical, 
grammatical, semantical features, and/or the like 
are used as contextual features. 

8. Method according to any one of the claims 6 or 7,
wherein statistical features on the usage, distribution,
and/or probability of speech elements - such
as words, subword units, syllables, phonemes,
phones, and/or the like - and/or combinations of
them within said speech input (SI) are used as
non-acoustical features.

9. Method according to any one of the preceding
claims, wherein a process of speech recognition is
carried out, in particular to prepare the extraction of
contextual features and/or the like.

10. Method according to any one of the preceding
claims, wherein a process of speaker identification
and/or adaptation is performed, in particular so as
to increase the matching rate of the feature
extraction and/or the recognition rate of the process of
speech recognition.

11. Method for synthesizing speech, in particular for a
man-machine dialogue system, wherein the method
for generating personality patterns according to
any one of the claims 1 to 10 is employed.

12. Method according to claim 11, wherein the method 
for generating personality patterns is essentially 
carried out in a preprocessing step, in particular 
based on a speech data base or the like. 

13. Method according to any one of the claims 11 or 12,
wherein the method for generating personality pat- 
terns is carried out and/or continued in a continu- 
ous, real time, or online manner. 

14. System for generating personality patterns and/or
for synthesizing speech which is capable of
performing and/or realizing the method for generating
personality patterns according to any one of the
claims 1 to 10 and/or the method for synthesizing
speech according to any one of the claims 11 to 13
and/or the steps thereof.

15. Computer program product, comprising computer
program means adapted to perform and/or to realize
the method for generating personality patterns
according to any one of the claims 1 to 10 and/or the
method for synthesizing speech according to any
one of the claims 11 to 13 and/or the steps thereof
when it is executed on a computer, a digital signal
processing means, and/or the like.